Enhancing the Interoperability of iSimp by Using the BioC Format

Authors

  • Yifan Peng
  • Catalina O Tudor
  • Manabu Torii
  • Cathy H Wu
  • K Vijay-Shanker
Abstract

*Corresponding author: Tel: 302 831 8496, E-mail: [email protected]

This paper reports the use of the BioC format in our sentence simplification system, iSimp, so that it can be used seamlessly in text mining pipelines. iSimp is designed to simplify complex sentences commonly found in biomedical text, thereby benefiting existing text mining applications that rely on the analysis of sentence structures. By adopting the BioC format, we aim to make iSimp readily integrable into various applications in this domain. To examine the utility of iSimp with BioC, we designed and implemented a rule-based relation extraction system that uses iSimp as a preprocessing module and the BioC format for data exchange. Evaluation on the BioNLP-ST 2011 GE task training corpus showed that, with sentence simplification provided by iSimp, the F-value of phosphorylation extraction increased by 3%. The iSimp corpus previously used for the evaluation of simplification and the GE task corpus used in the current study have been converted into the BioC format and made publicly available.

Introduction

The syntactic complexity of biomedical text often poses a major challenge in designing and applying Natural Language Processing (NLP) systems to scientific articles. One possible approach to address this issue and improve the performance of NLP systems (e.g., relation extraction systems) is to simplify the sentences themselves before passing them as input to those systems. For this purpose, we previously developed iSimp [1], a sentence simplification system. Used as a preprocessing module that simplifies the input text, iSimp has the potential to enhance existing text mining applications in the biomedical domain. To make iSimp readily integrable into various applications, we have adopted the BioC format, a simple yet robust XML format for sharing text documents and annotations [2].
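To make the idea of exchanging text and annotations in BioC more concrete, the following is a minimal sketch of how a downstream tool might read annotations from a BioC-style XML document. It is not the iSimp implementation. The sample document, the sentence in it, the annotation, and the helper name `read_annotations` are all invented for illustration, and the element names follow the general BioC layout (collection, document, passage, annotation) as an assumption rather than a verified schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical BioC-style collection; the content is invented for illustration.
SAMPLE = """<collection>
  <source>example</source>
  <date>2013</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>Akt phosphorylates GSK3.</text>
      <annotation id="T1">
        <infon key="type">Protein</infon>
        <location offset="0" length="3"/>
        <text>Akt</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_string):
    """Return (document id, annotation id, type, covered text) tuples."""
    root = ET.fromstring(xml_string)
    results = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            # Annotation offsets are document-level, so subtract the passage offset.
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                start = int(loc.get("offset")) - passage_offset
                end = start + int(loc.get("length"))
                infons = {i.get("key"): i.text for i in ann.findall("infon")}
                results.append(
                    (doc_id, ann.get("id"), infons.get("type"), passage_text[start:end])
                )
    return results


if __name__ == "__main__":
    for row in read_annotations(SAMPLE):
        print(row)
```

Because every module reads and writes the same structure, a preprocessing step such as sentence simplification could in principle add its own annotations to a passage without the downstream extractor needing a custom parser.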


Similar resources

iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system

This article reports the use of the BioC standard format in our sentence simplification system, iSimp, and demonstrates its general utility. iSimp is designed to simplify complex sentences commonly found in the biomedical text, and has been shown to improve existing text mining applications that rely on the analysis of sentence structures. By adopting the BioC format, we aim to make iSimp readi...


tmBioC: improving interoperability of text-mining tools with BioC

The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial efforts and time owing to heterogeneity and variety in data formats. In response, BioC is a recent proposal that offers a minimalistic...


BioC interoperability track overview

BioC is a new simple XML format for sharing biomedical text and annotations and libraries to read and write that format. This promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format. These contributions included additional imple...


BioCconvert: A Conversion Tool Between BioC and PubAnnotation

BioC is a simple XML data format for text, annotations, and relations. PubAnnotation is a repository of text annotations focused on the life science literature. A conversion tool between BioC XML and the JSON import / export format of PubAnnotation has been developed, BioCconvert. As a demonstration, the Ab3P gold standard abbreviation annotations are being made available through PubAnnotation....


A Visual Tool for Displaying Annotations in BioC

To support data interoperability for annotation results, we developed BioC-Viewer (http://viewer.bioqrator.org/). BioC-Viewer is a web-based interactive curation tool to visualize annotation results from text mining tools and to easily curate entities and relationships with supporting BioC format. Since our focus was on a visual tool for BioGRID curators in BioCreative V BioC Track, BioC-Viewer...




Publication date: 2013